The similarity between the antecedents of listening and the consequences
of speaking has long led theorists to believe in their intimate and even
natural relationship. A tendency to conflate the two processes is found
in articles on speech communication, automatic speech recognition,
foreign language learning, and child language.

Once this scattered literature is assembled and organized, it becomes 
clear that the present emphasis is on conflation whereas the evidence 
in each of the four areas favors a more balanced view. Properties of 
phoneme perception, for example, are often explained in terms of 
phoneme production, whereas these properties seem to recur in other 
kinds of perception, unrelated to articulation. At the suprasegmental 
level, the view that the listener refers what he hears to how he would 
say it is opposed by the different dynamic characteristics of speaking 
and listening, which are evident whenever a speaker adjusts his level
to match a numerical or acoustic criterion, or to compensate for 
changes in sidetone, or to maintain intelligibility despite increasing 
noise. When the relations between speaking and listening are altered 
by pathology, case studies find further evidence for their relative 
independence in the normal individual. 



Some early thoughts on the relation of speaking to listening are as follows: 

"Lastly, I am to take notice that there is so great a communication 
and correspondency between the nerves of the Ear and those of the 
Larynx that whensoever any sound agitates the Brain, there flow 
immediately spirits towards the muscles of the Larynx, which duely 
dispose them to form a sound altogether like that which was just now 
striking the Brain. ... It will come to pass that the Brain, which 
thereby is often shaken in the same places, sends such a plenty of 
spirits through the nerves that are inserted in the muscles of the 
throat that at length they easily move all the cartileges which 
serve for that action, as ’tis requisite they should be moved to 
form sounds like those that have shaken the Brain." 

De Cordemoy, who wrote these words in 1668, may have been the first man to
conflate the processes of listening and speaking; he certainly was not the last.
It is evident that speaking and listening are related but distinct processes,
and that an adequate theory of language behavior must take this into account.
Nevertheless, most theories give great weight to the presumptive similarities 
between uttering speech and listening to it, while giving little weight to their 
differences. Perhaps this article will help to redress the balance by assembling 
the scattered literature which illustrates how one-sided our view has become. 



Where Conflation Occurs 

The view that listening involves speaking, or that speaking involves listening,
is prominent in such diverse areas as speech communication (at both the segmental
and suprasegmental levels), automatic speech recognition, foreign language
learning, and child language.

Speech Communication (Segmental Level) 

"The hearer matches the acoustic stimuli he receives against his own
habits of muscular speech action, and identifies the incoming sound 
as corresponding to this or that of his own speech articulations. 

At both ends of a speech transmission, it is the muscular activity, 
not the acoustic character, which dominates the identification." (p. 609) 

Thus wrote Twaddell in 1952, and Hockett reasoned similarly in his Manual of
Phonology (1955):

"We may suspect that Jack’s speech transmitter is not completely
quiescent just because at the moment he is broadcasting nothing. 

As he listens to Jill, his Speech Receiver is able to decode the 
signal partly because the incoming signal is constantly compared 
with the articulatory motions which Jack himself would have to make 
in order to produce an acoustically comparable signal." (p. 7) 

In short: speech is perceived by reference to articulation. Such is also
the thrust of Liberman’s review of research on speech perception carried out
at the Haskins Laboratories: "...articulatory movements and their sensory
effect," he wrote, "mediate between the acoustic stimulus and the event we
call perception."

Delattre reached a similar conclusion in his survey of the acoustic corre- 
lates of consonants and vowels: 

"...if there exists an invariant that allows one consonantal place
of articulation to be distinguished from another, it lies in the
articulatory gesture rather than in the acoustic feature: the
acoustic form of speech would be perceived not directly but
indirectly, by reference to the articulatory gesture, which is the
same for several different acoustic values."

Speech Communication (Suprasegmental Level)

This point of view encompasses not only vowels and consonants; other
investigators contend that reference to articulation is similarly required for
the perception of suprasegmental features. "Accent is sui generis, depending
for its perception on the kinesthetic sense," Jones wrote in 1932. "The
listener refers what he hears to how he would say it. Thus he translates
exteroceptor into proprioceptor sensations, the kinesthetic memory serving as
stimulus." (p. 74)

Ladefoged, Draper, and Whitteridge examined this claim in several electro- 
myographic studies that led them to conclude as follows in 1958: 

"[Statements about stress] are usually best regarded as statements
about the speaker’s muscular behavior (or about the actions of the
listener's muscles which would have to be made in order to produce
similar sounds)."

Fonagy (1966) similarly favors a motor theory of stress perception, citing
the concurrence of Daniel Jones (1950), Stetson (1951), Schmitt (1924), Laziczius
(1961), Lehiste and Peterson (1959), and Jespersen (1932), whom he considers to be
the first to state the position, as follows:

"The hearer identifies himself with the speaker. As he generally 
perceives the utterance by a silent coarticulation of the sounds 
uttered (slightly innervating his speech organs) he is founding 
his estimate of intensity of each syllable on the effort spent on 
its production."

Nor is the motor theory confined to the perception of contrastive stress in
a complete utterance. Going beyond the reference to articulation for the
resolution of phonemic contrasts, Ladefoged and McKinney (1963) suggest that
"the perceived loudness of [all] words which are within the normal speech range
is largely dependent on... the physiological effort required to produce them."
(p. 459)

As in the case of loudness and stress, it has been contended that production 
enters into the perception of pitch and intonation. Galunov and Chistovich, in 
their 1966 review of the motor theory, find that the results of studies on pitch 
matching by Leont'ev (1959) give "conclusive demonstrations of the significance 
of the motor representation of signals for their evaluation." (p. 361) 

Thurlow (1963), addressing himself to the problem of the "missing fundamental"
(a complex tone has a pitch corresponding to its fundamental even if the latter
is suppressed), believes it is not missing at all: listeners covertly match the
tones with phonation, then judge their own fundamental.

The theme of analysis by synthesis recurs, too, in Lieberman's recent
monograph on "Intonation, perception and language." "Listeners often seem to
perceive intonation and stress by means of a process of analysis-by-synthesis,"
he writes, "in which they make use of their knowledge of the articulatory
gestures that are involved in the production of speech."

Automatic Speech Recognition 

Analysis-by-synthesis is not only the hypothesized recognition process in
the listener; it is also the actual recognition strategy incorporated in some
recent approaches to automatic speech recognition. Whatever the merits of a
motor theory of speech perception, the proposed strategy for mechanical
recognition has an advantage: a passive device whose capabilities approached
those of a human listener would require a prohibitively large dictionary of
stored patterns, whereas the active device does not, since generative rules
are stored in the memory of the analyzer instead. However, Stevens (1960) sees
the strategy as simulating a basic feature of human recognition:

"In the synthesis process, a representation of the signal at the 
articulatory level will certainly occur," he writes, "A similar 
representation may likewise exist at some stage during the 
reverse process of speech recognition." (p. 53) 
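To make the contrast between the active and passive designs concrete, here is a toy sketch. It assumes a made-up two-parameter "articulatory" description (tongue height and tongue advancement, each from 0 to 1) and an invented linear rule mapping it to two formant frequencies; neither the rule nor the numbers come from Stevens (1960) or from any actual recognizer, and the function names are hypothetical. The active recognizer stores only the generative rule and searches for the articulatory description whose synthesized output best matches the input; the passive recognizer compares the input with a stored dictionary of patterns.

```python
import itertools
import math


def synthesize(height: float, advancement: float) -> tuple[float, float]:
    """Hypothetical generative rule: articulatory parameters -> (F1, F2) in Hz.
    The linear mapping is invented for illustration, not a vocal-tract model."""
    f1 = 800.0 - 500.0 * height        # higher tongue -> lower F1
    f2 = 900.0 + 1400.0 * advancement  # more fronted tongue -> higher F2
    return f1, f2


def analyze_by_synthesis(observed: tuple[float, float], steps: int = 11) -> tuple[float, float]:
    """Active recognition: synthesize candidate signals from the rule and keep
    the articulatory description whose output lies closest to the input."""
    grid = [i / (steps - 1) for i in range(steps)]
    return min(itertools.product(grid, grid),
               key=lambda params: math.dist(synthesize(*params), observed))


def recognize_passively(observed: tuple[float, float],
                        templates: dict[str, tuple[float, float]]) -> str:
    """Passive recognition: return the label of the nearest stored pattern."""
    return min(templates, key=lambda label: math.dist(templates[label], observed))


# The active analyzer recovers the parameters that generated a signal.
print(analyze_by_synthesis((300.0, 2300.0)))
# The passive analyzer needs a stored pattern for every category it can report.
print(recognize_passively((310.0, 2250.0), {"i": (300.0, 2300.0), "a": (750.0, 1200.0)}))
```

The design point the text makes shows up in what each function keeps in memory: `analyze_by_synthesis` needs only `synthesize`, while `recognize_passively` needs a template for every pattern it is to identify.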

Second Language Learning 

The question whether recognition procedures should be active or passive
is equally a matter of design in a second area of application, foreign language
instruction, where it takes the form: should production of foreign language
sounds be taught before their discrimination? Of course, the question may be
resolved on other grounds, but the view that listening to a foreign language
entails speaking seems just the reverse of Nelson Brooks’ dictum (1959), which
characterizes the modern audiolingual method, that the learner should speak
only on the basis of what he has heard. Nevertheless, Hockett seems to call
for just such a reversal when he states, in support of a motor theory of
perception, that "...in learning a foreign language, one has considerable
difficulty hearing correctly until one can also pronounce correctly."

First Language Learning 

Finally, the view that listening entails speaking, and the complementary
view that speaking entails listening, make repeated reference to a third area
of inquiry, child language. Kozhevnikov and Chistovich write as follows in
their 1966 book on speech articulation and perception:

"On considering the formation of phonemic classification system in 
the process of speech development in a child, the motor theory stresses 
the role of the effects of imitation of the perceived sounds. In
the process of imitation are provided conditions that are favorable
to the forming of the conditioned-reflex correlations between
groups of sound signals and complexes of articulatory motion....

It is assumed that these conditioned reflex correlations play an
important part also in the process of speech recognition by an
adult."

Liberman and colleagues seem to hold a related view: 

"We believe that in the course of his long experience with language, 
a speaker (and listener) learns to connect speech sounds with their 
appropriate articulations. In time, these articulatory movements 
and their sensory feedback (or, more likely, the corresponding 
neurological processes) become part of the perceiving process, 
mediating between the acoustic stimulus and its ultimate
perception."

These views of speech development are not novel. Compare the immediately 
preceding quotations from this decade with the theory of de Cordemoy and that 
of Allport (1924):

"...the baby utters the syllable da... He receives certain
kinaesthetic... and auditory sensations... Returning to the brain
centers these afferent impulses are, or tend to be, redischarged
through the same motor pathways as those used in speaking the
syllable itself... If the ear-vocal reflexes have been
sufficiently established for the sound of a word to call forth the
response of articulating it, it is no longer necessary that the
child himself, should speak the stimulating word. It may be
spoken by another."

Three centuries do not seem to have changed the position very much, except
for the premise that the connections are learned, and even this premise has
been challenged in part by Lieberman (1967a).



Why Conflate 

Apparently, the similarity between the antecedents of listening and the
consequences of speaking has led theorists to believe in their intimate and
even natural relationship. Indeed, this isomorphism and the temporal and spatial
overlap of the two processes are probably two of the main reasons for the
current tendency to talk about listening in terms of speaking and vice versa.
Two more reasons for this conflation should be mentioned before the
countervailing view is set forth here. First, there is a close correlation
between patterns of articulation and the acoustic patterns that they generate
(Fant, 1960). Therefore, the stimulus for speech perception can be described in
either coordinate system. As it happened, articulatory phonetics began before
acoustic phonetics, and the acoustic stimulus for perception is often described
in articulatory terms. Perhaps the practice of talking about the stimuli as if
they were articulatory encouraged scientists to think of them as articulatory
after all.

A fourth factor that may be responsible for the predominance of models
that conflate speaking and listening has been formulated by Galunov and
Chistovich (1966), who find that a motor theory "permits the problems of speech
recognition and control to be combined and thus enables a broader class of 
mathematical models and methods to be utilized in theoretical work." (p. 362) 
Liberman, Cooper, Studdert-Kennedy, Harris, and Shankweiler (1965) give a 
similar reason when they state: 

"It seems most unparsimonious to assume that the speaker-listener 
has two separate centers of equal status, one for encoding language 
and the other for decoding it. We would rather assume that there 
is only one center, with some kind of link between sensory and 
motor components." (p. 10) 



Against Conflation 

An examination of some arguments and findings that oppose the conflation
of speaking and listening may begin with the "theoretical" considerations just
cited: the correlation in time and form between listening and speaking, the
tendency to describe the one in terms of the other, and the greater parsimony
of one language center instead of two.

As for the correlations between articulation, speech wave and perception,
these do not necessarily lead to the integration of speaking and listening.
As Fant (1967b) suggests, the consequence may be solely that "auditory patterns
would be structured rather similar to the patterns of motor commands" (p. 2) in
two rather distinct systems. In fact, according to Penfield and Roberts (1959),
there are two separate language centers on the word level for motor and sensory 
functions, not one. As for the remaining source of conflation, it need hardly 
be pointed out that to describe speech by reference to articulation is one 
thing, to perceive it by reference to articulation another. As Fant (1963) 
puts it: 

"The reference to articulation primarily serves a function within 
the metalanguage whereby we as outside observers may conveniently 
describe speech. But is it actually a part of speech perception? 

...The alternative view I would like to propose here is that if 
the auditory analysis in the hearing process has proceeded so far 
as to allow the proposed articulatory matching, the decoding 
could proceed without an articulatory reference." (p. 1) 
Related evidence for a separatist view comes from clinical and
developmental studies. Lenneberg (1962), Fuller (1966), and MacNeilage, Rootes, and
Chase (1967) have all conducted case studies of patients with congenital 
impairments of speech production; the diagnoses included anarthria, aphasia, 
and cerebral palsy. Their conclusions were, respectively: 

"An organic defect prevented the acquisition of the motor skills 
necessary for speaking a language, but evidence was presented 
for the acquisition of grammatical skills as required for a 
complete understanding of language." (p. 424) 

"Neither babbling, imitation, or articulate speech is necessary 
for understanding the natural language." (p. 5) 

"Despite the severe speech production deficits, speech percep- 
tion approached normality, even in some characteristics which, 
according to the motor theory of speech perception, are depen- 
dent on the listener’s referring to the neural correlates of 
normal speech motor control. Reference to normal motor infor- 
mation does not therefore appear necessary for these types of 
perceptual performance." (p. 449) 

Lebrun (1968) discusses several other case reports of normal speech per- 
ception despite concurrent dysarthria, among them the reports of Bang and 
Palmer (1956) and Lhermitte, Gautier, Marteau, and Chain (1963). His own 
case study of a patient with "cortical anarthria without any concomitant 
dysphasic impairment" (1967) supplements the other studies in indicating 
that reference to articulatory movement is not required, either at peripheral
or at central levels, for normal speech perception.

These various findings accord with Fant’s view (1967b) that "the capacity
of perceiving distinctive auditory patterns on the subphonemic level [develops]
in the early learning process prior to and not critically dependent of
corresponding motor patterns." (p. 2) Similarly, Jakobson, in his book on child
language (English translation 1968), refers on the one hand to sounds that the
child discriminates but does not produce differently, and on the other hand to
sounds that the child fails to discriminate but that nevertheless appear as
discretely different in his babbling repertoire (p. 11 and fn. 14). Also, the
aphasic may produce relevant sounds but fail to discriminate them (p. 20).

Second Language Learning 

Jakobson and Halle (1956) clearly favor a redress of the balance away from 
conflationist theories of speaking and listening when they write: 
"The theoretically unlikely surmise of a closer relationship between
perception and articulation than between perception and its immediate
stimulus finds no corroboration in experience: the kinesthetic
feedback of the listener plays a very subordinate and incidental role."
(p. 34)

Their claim is clearly opposed to that of Twaddell, Liberman, and Hockett, cited
at the outset, and they continue by disputing Hockett's example from foreign
language learning. "Not seldom," they write, "do we acquire the ability to
discern foreign phonemes by ear without having mastered their production and
in a child's learning of language an auditory discrimination of adults'
phonemes often precedes the use of these phonemes in his own speech." (p. 34)

Sapon (1965a) also argues that "receptive and productive language are
functionally different, differ in antecedent learning conditions and make
uniquely different demands on the organism":

"The analysis of a language from the point of view of its production
yields a much more complex system than an analysis based on its
perception, yet the teaching of both production and perception is
usually based on the analysis appropriate only to the latter... what
is wanted is... a micro-analysis of the behavior of the speaker as
distinguished from that of the listener."

Automatic Speech Recognition 

If the problems of foreign language learning hardly provide univocal
support for the conflation of speaking and listening, neither do those of
automatic speech recognition. There are, of course, numerous strategies for
recognition, some of which do not engage the issue, while others simulate
the dualist position more closely, following so-called "passive" strategies
for analysis and identification. In an article entitled "Passive vs active
recognition models or is your homunculus really necessary," Morton and Broadbent 
(1967) express two "objections to active models of speech perception in general... 
these are firstly that the evidence quoted in their favor is not really incon- 
sistent with a passive explanation, and secondly that an alternative passive 
model is of greater generality." (p. 2) 

Speech Communication (Segmental Level) 

This first objection is also the thrust of a series of studies carried out
by Lane and collaborators during recent years. The inference that speaking
mediates listening has been argued experimentally, at the segmental level, from
evidence for "categorical perception", that is, from certain properties of
identification and discrimination functions for synthetic speech continua.
However, Lane and others have obtained identification and discrimination functions
with these properties along nonspeech continua, both visual and acoustic. Thus,
Cross, Lane and Sheppard (1965) reason:

"The postulation of a special perceptual mechanism for speech
perception is not warranted.... The relations among identification
probability, latency, topography and discrimination accuracy on which the
motor theory is based are not at all peculiar to speech perception 
but are, more broadly, the result of a rather general paradigm for 
discrimination training and testing." (p. 74) 

The motor theory of speech perception cannot adduce much support from the 
finding of categorical perception of speech sounds when it is shown that noises 
and visual patterns can be perceived categorically, too. Perhaps the most 
striking example of the generality of categorical perception, and hence of its 
irrelevance for conflating speaking and listening, comes from some studies of 
color perception that Lane and Kopp have been conducting. In order to illus- 
trate how closely the findings for color match those for speech, and how, there- 
fore, they constrain the interpretation of the speech results, we may juxtapose 
the findings for color perception, shown in Figure 1, with a description of the
findings for speech perception, taken from an article on "a motor theory of
speech perception."

Insert Figure 1 about here

Substituting color terms for speech terms, the article then reads, in
paraphrase:

"Although the [colors] lie on a [visual] continuum, the perception 
is essentially discontinuous. Because of the discrimination peaks 
at the [color-class] boundaries, the incoming [colors] are [seen] 
categorically. . .and they are therefore, quickly and accurately 
sorted into the appropriate [color-class]..." (p. 3) 

As a description of findings, the paragraph applies as well to color as it 
did to speech. Discrimination (dotted line) is indeed better across color 
boundaries than within color classes. The sorting of colors is indeed accurate, 
that is, unequivocal, since the step-like labelling curves tell us that the 
observer always or never assigns wavelength L to color C. And the colors are 
sorted quickly (broken line) as well as accurately, especially the characteristic
hues at the centers of the color categories.

If the article's description of categorical perception applies to color and
to speech sounds, the same may not be said concerning its interpretation of
categorical perception. Continuing the paraphrase:
"What kind of mechanism underlies the categorical perception of the
[colors]? The answer seems to us... that the perception of [color]
is tightly linked to the feedback from the speaker’s own
articulatory movements.... In time, these movements... come to mediate
between the [color] and its ultimate perception." (p. 4)

Clearly, we cannot account for diverse examples of categorical perception with
a motor theory in the narrowest sense. However, a related and much more general
mechanism was put forth independently by Chistovich, Klaas, and Alekin (1961),
by Cross, Lane and Sheppard (1965) and, for different reasons, by Glanzer and
Clark (1963). To quote the first of these:

"The logical outline of discriminating sequences of sounds is as 
follows: the sound signals are transformed into the corresponding 
decisions. . .these decisions are remembered and are subsequently 
used in choosing a final outward reaction."
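The mechanism quoted above, in which each stimulus is converted into a decision and only the decisions are remembered, can be made quantitative. A minimal sketch, assuming an ABX task, two response categories, independent labeling of A, B, and X, and a listener who guesses whenever A and B receive the same label: predicted proportion correct is then 0.5 · [1 + (p_A − p_B)²], where p_A and p_B are the probabilities of assigning stimuli A and B to the first category. This is a labeling-based prediction of the kind used in categorical-perception studies of the period, not a formula taken from the authors cited here; the function names and the identification values are hypothetical. The enumeration function checks the closed form by summing over every possible labeling.

```python
from itertools import product


def abx_correct_closed_form(p_a: float, p_b: float) -> float:
    """Predicted ABX proportion correct when only category labels are kept."""
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


def abx_correct_by_enumeration(p_a: float, p_b: float) -> float:
    """Same prediction, obtained by enumerating every labeling outcome.
    Each stimulus gets label 1 with its own probability, independently;
    X is equally likely to be A or B; the listener guesses when A and B
    share a label, and otherwise picks the stimulus whose label X shares."""
    total = 0.0
    for x_is_a in (True, False):
        p_x = p_a if x_is_a else p_b
        for a, b, x in product((0, 1), repeat=3):
            prob = ((p_a if a else 1 - p_a)
                    * (p_b if b else 1 - p_b)
                    * (p_x if x else 1 - p_x)
                    * 0.5)                     # probability of this X assignment
            if a == b:
                total += prob * 0.5            # labels uninformative: guess
            elif (x == a) == x_is_a:
                total += prob                  # X's label points to the right stimulus
    return total


# Hypothetical identification probabilities along a seven-step continuum.
identification = [1.0, 0.98, 0.9, 0.5, 0.1, 0.02, 0.0]
predicted = [abx_correct_closed_form(p, q)
             for p, q in zip(identification, identification[1:])]
# Predicted discrimination is near chance within categories and peaks for
# the adjacent pairs that straddle the 50% labeling point.
print([round(v, 3) for v in predicted])
```

Under these assumptions, discrimination peaks at category boundaries for any continuum with step-like labeling curves, whether the stimuli are speech sounds, noises, or colors, which is the generality the text argues for.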

Speech Communication (Suprasegmental Level) 

When a speaker judges the suprasegmental characteristics of his own speech,
the possible sources of cues include airborne sound (air sidetone), head
sidetone, and proprioception. Of course, when the speaker stops talking and
listens to someone else instead, he is deprived of most of these cues, and as a
listener, he must base his suprasegmental judgments differently.

Since the sensory characteristics of speaking and listening are thus seen
to be quite different structurally, it is not surprising to learn that they
are quite different functionally. This is the conclusion of Lane and
collaborators in a series of studies over the past seven years: autophonic
scales (scales of the speaker’s perception of his own voice) and reception
scales are systematically different (Lane, 1962). Consider loudness and
stress, for example. The autophonic scale has an exponent roughly double that
of the reception scale (Lane, Catania & Stevens, 1961). The disparity was
established in dozens of replications of the magnitude scaling techniques and
confirmed by the method of cross-modality validation (Stevens, 1959).
Nevertheless, Ladefoged (1959) suggests that we judge loudness in terms of vocal
level, and Warren (1962) suggests that we judge vocal level in terms of
loudness. However, the reception scale is not the autophonic scale, and the
autophonic scale is not the reception scale. In fact, when speakers vary
autophonic level in order to match changes in the loudness of a criterion
stimulus, or in order to compensate for changes in sidetone loudness, or in
order to maintain intelligibility despite increasing noise, in all these tasks
about the same relation is found between the dynamics of listening and
speaking, and it is never the identity relation postulated by Ladefoged,
Warren, and others.

Matching a criterion. Since loudness grows about half as fast as autophonic
level, a listener presented with a fourfold increase in sound intensity
will match it with a twofold increase in vocal level. In other words, the two- 
to-one ratio between the slopes (exponents) of the autophonic and reception 
scales yields an equal-sensation function that is linear in decibel coordinates,
with a slope of about one half. Accordingly, Lane, Catania and Stevens (1961) 
obtained slopes of .51 and .52 in two different studies using noise intensities 
as the criterion stimuli. They also turned up some unpublished studies of Black 
(1955) which, despite considerable procedural differences, gave fair agreement 
(slopes about .64). Lane (1962) obtained flatter matching functions (slopes 
.33 and .35) when he had subjects imitate two-syllable words with iambic or 
trochaic stress. Finally, Irwin and Mills (1965) observed that the disparity 
between autophonic and reception scales can be validated not only by measuring 
pairs of stimuli corresponding to equal sensations, as in the preceding studies, 
but also by measuring pairs of sensations corresponding to equal stimuli. When 
they had speakers produce various autophonic levels and listeners estimate their 
loudness, they found that the magnitudes given to the speakers and those reported 
by the listeners were related to each other as predicted (slopes .52 and .53 in
two studies).
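The reasoning of this paragraph can be written out as a short derivation. As a sketch, assume power laws for both scales: $\psi_R$ is the loudness of a criterion of intensity $I$, $\psi_A$ is the autophonic magnitude of a vocal output of intensity $V$ (both intensities in the same physical units, with reference values absorbed into the constants $k_R$ and $k_A$), and the exponent ratio $\alpha \approx 2\beta$ is the one reported in the text. The symbols are ours, not the original authors'.

```latex
\begin{aligned}
&\psi_R = k_R\, I^{\beta}, \qquad \psi_A = k_A\, V^{\alpha}, \qquad \alpha \approx 2\beta \\[4pt]
&\text{Equal sensations: } k_A V^{\alpha} = k_R I^{\beta}
  \;\Longrightarrow\; V = \left(\frac{k_R}{k_A}\right)^{1/\alpha} I^{\beta/\alpha} \\[4pt]
&\text{In decibels: } 10\log_{10} V = \frac{\beta}{\alpha}\, 10\log_{10} I
  + \frac{10}{\alpha}\log_{10}\frac{k_R}{k_A},
  \qquad \text{slope} = \frac{\beta}{\alpha} \approx \frac{1}{2} \\[4pt]
&\text{Example: } \Delta(10\log_{10} I) = 10\log_{10} 4 \approx 6.0\ \text{dB}
  \;\Longrightarrow\; \Delta(10\log_{10} V) \approx 3.0\ \text{dB}
  \quad (\text{a twofold increase in vocal intensity})
\end{aligned}
```

The additive constant shifts the matching function up or down but does not affect its slope, which is why matching studies with different procedures can still agree on a slope near one half.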

Compensating for sidetone. Since loudness grows about half as fast as autophonic
level, a speaker presented with a fourfold increase in sidetone will
restore his original loudness by halving his vocal level. In other words, the 
compensation function is the reciprocal of the matching function; both have 
exponents whose absolute value is about one half. This is just what Lane, 

Catania, and Stevens (1961) found when they manipulated the sidetone fed back 
to the speaker’s ears in an interphone system, and instructed each speaker to 
compensate for changes in sidetone so as to hold the loudness of his voice
constant, as he perceived it (obtained slope, -.46). Moreover, subjects will
compensate in this way even without instruction or reflection; the same authors
found that subjects asked to vocalize at a medium level produce a sound
pressure that varies inversely with the sidetone gain (slope, -.42). Similarly,
the unpublished measurements of Licklider and Kryter (1944) at 35,000 and 5,000 ft., 
and those of Lightfoot and Morrill (1949) on the ground, show that the changes in 
vocal level caused by inverse changes in interphone gain are governed by the 
disparity between the dynamics of speaking and listening (slopes -.45, -.43, and
-.40, respectively). Gardner's findings (1966) are consistent with those
preceding, although they embrace a wider range of slopes. Black (1951) reduced
effective sidetone below normal by producing temporary threshold shifts through
prolonged exposure to noise, then measured average vocal level (slope -.47).
McKown and Emling (1933) found that increases in telephone sidetone at the
transmitter produced decreases approximately half as great (in decibel units)
in vocal level. When a speaker's sidetone was increased 10 db, his listeners
asked him to repeat what was said
about as often as when the volume in their receivers was decreased 5 db. It 
is not surprising that a 10 db increase in the speaker's sidetone had the same 
effect as a 5 db decrease in the listener's volume, since a 10 db increase in 
sidetone in fact produces about a 5 db decrease in volume — according to the 
ratio of the autophonic and reception scales. Not unexpectedly, the compensa- 
tion function tends to be flatter when sidetone is changed for only one ear and 
is left unchanged for the other ear: slopes between -.3 and -.4 are reported 
under these conditions by Lane, Catania and Stevens (1961), Fletcher, Raff and 
Parmley (1918), and Noll (1964a, 1964b).
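The decibel arithmetic in this paragraph can be checked with a few lines of code. This is a minimal sketch that treats the compensation function as a straight line in decibel coordinates with slope about -0.5, the approximate value given in the text; the function names are hypothetical, and the slope is a parameter rather than a fixed constant, since the reported binaural values run from about -.40 to -.47 and the monaural ones from about -.3 to -.4.

```python
import math


def db_from_power_ratio(ratio: float) -> float:
    """Convert an intensity (power) ratio to decibels."""
    return 10.0 * math.log10(ratio)


def power_ratio_from_db(level_db: float) -> float:
    """Convert a decibel change back to an intensity (power) ratio."""
    return 10.0 ** (level_db / 10.0)


def vocal_change_for_sidetone(delta_sidetone_db: float, slope: float = -0.5) -> float:
    """Predicted change in vocal level (dB) for a change in sidetone gain (dB),
    assuming a linear compensation function in decibel coordinates."""
    return slope * delta_sidetone_db


# A fourfold increase in sidetone (about +6 dB) is offset by roughly
# halving vocal intensity (about -3 dB).
change = vocal_change_for_sidetone(db_from_power_ratio(4.0))
print(round(change, 2), round(power_ratio_from_db(change), 3))

# The telephone case: +10 dB of speaker sidetone lowers vocal level about 5 dB,
# so the listener receives about 5 dB less, as if his receiver volume were cut 5 dB.
print(vocal_change_for_sidetone(10.0))
```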

Compensating for noise. Since loudness grows about half as fast as autophonic
level, a speaker confronted with a fourfold increase in ambient noise
will restore the perceived signal-to-noise ratio by a twofold increase in his
vocal level. This is what Webster and Klumpp found in 1962 (slope .5) when
they instructed their speakers to maintain intelligibility despite increasing
noise. As in the case of compensation for sidetone, a smaller but comparable
adjustment seems to be made automatically. Indeed, these compensatory adjustments
are sometimes called the Lombard Reflex (Sullivan, 1963), after the
French doctor who used the phenomenon to detect malingering early in this
century (Lombard, 1911). Noise compensation functions with slopes between .3
and .4, over low to moderate noise levels, have been reported by Kryter (1946),
Hanley and Steer (1949), Korn (1954), Pickett (1958), Dreher and O'Neill (1958),
and Gardner (1964, 1966). It may be that both kinds of compensation, the sidetone
penalty and the Lombard Reflex, reflect an unconscious effort by the speaker to
keep the signal-to-noise ratio, and hence his intelligibility, nearly constant.
When the speaker has the impression that he has succeeded in this effort, the
listener has the impression that he has not, because of the differences between
speaking and listening.
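The noise-compensation argument can be checked the same way. The sketch below assumes a power-law compensation function (a straight line in log-log or decibel coordinates), with noise and vocal level measured in the same physical units. It shows that a slope of .5 turns a fourfold noise increase into a twofold vocal-level increase and reports how the physical signal-to-noise ratio changes after compensation. The function names are illustrative only.

```python
def vocal_level_factor(noise_factor: float, slope: float = 0.5) -> float:
    """Multiplicative change in vocal level for a multiplicative change in
    ambient noise, assuming a power-law compensation function (a straight
    line in log-log coordinates) with the given slope."""
    return noise_factor ** slope


def vocal_change_db(noise_change_db: float, slope: float = 0.5) -> float:
    """The same compensation function expressed in decibels: a straight
    line through the origin with the given slope."""
    return slope * noise_change_db


def snr_change_db(noise_change_db: float, slope: float = 0.5) -> float:
    """Change in the physical signal-to-noise ratio (db) after the speaker
    has compensated: vocal-level change minus noise change."""
    return vocal_change_db(noise_change_db, slope) - noise_change_db


print(vocal_level_factor(4.0))  # fourfold noise -> twofold vocal level: 2.0

# Instructed compensation (slope .5) versus the automatic range (.3 to .4).
for slope in (0.5, 0.3, 0.4):
    print(slope, vocal_change_db(20.0, slope), snr_change_db(20.0, slope))
```

With a slope of .5, a 20 db rise in noise raises vocal level 10 db, so the physical signal-to-noise ratio still falls by 10 db; with the automatic slopes of .3 to .4 it falls further. This is one way to read the remark that the speaker believes he has compensated while the listener does not.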
The sensory dynamics of listening and speaking are not only different, as
shown above, but also, judging from experiments in which their normal
correlations are disturbed, little related causally. When autophonic level
is varied from whispering at one extreme to shouting at the other, for
example, the quality of the voice inevitably varies, but the listener's judgments
of loudness are practically unaffected by these variations (Lane, 1962). On
the other side of the coin, hundredfold amplification or attenuation of
sidetone level, or total masking of sidetone, leaves the autophonic scale
essentially unchanged. Moreover, the speaker readily maintains a constant
vocal level, when instructed to do so, despite changes in sidetone level as
great as 80 db (Lane, Catania & Stevens, 1961). Furthermore, Lane has shown
that congenitally deaf speakers perceive their own vocal level just as normal
speakers do; they give the same autophonic scale (Lane, 1963).
Fairbanks (1954) described the "speaking system" as a servosystem, emphasizing
the control function of auditory feedback: "When I say a word and you
repeat it, your hearing apparatus measures my word for purposes of estimation
and then your word (the same word) for purposes of control" (p. 135). The
preceding evidence, however, favors a verdict of independence over one of interdependence.
To add to that evidence, here are the conclusions of Kozhevnikov and Chistovich 
(1966) about the control of syllable initiation: 

"Thus, it is necessary to exclude the hypothesis that acoustic changes
connected with the beginning of a syllable are used by the nervous
system as necessary signals for producing the command for the beginning
of the next syllable... proprioceptic impulsation also is not the
necessary signal... each following syllable command is given automatically;
to produce the command it is not necessary to use the
afferent impulsation which occurs upon the accomplishment of the preceding
signal." (pp. 101 ff.)

Further evidence that speaking does not depend on listening in a "closed
loop" fashion comes from recent research on delayed auditory feedback (DAF)
by Chase, Cullen, Niedermeyer, and Blumer (1967). Although the disruptive
effects of DAF are sometimes cited in support of a conflationist view of speaking
and listening, the disruption of speech by exaggerated values of auditory
feedback does not demonstrate that speech depends on this feedback. These
investigators found that the effects of DAF are cut off during psychomotor seizures,
while normal speech may continue. They conclude that "speech can be elaborated
without closed-loop auditory-feedback control."
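The distinction between closed-loop and open-loop control of syllable timing can be made concrete with a toy model. The sketch below is hypothetical and is not taken from Fairbanks, Kozhevnikov and Chistovich, or Chase et al.; it assumes fixed syllable durations and shows only what a strictly closed-loop scheme would predict, namely that any added feedback delay lengthens every interval between syllable onsets, whereas an open-loop scheme of the kind Kozhevnikov and Chistovich describe is unaffected.

```python
from typing import List, Sequence


def closed_loop_onsets(durations_ms: Sequence[float], feedback_delay_ms: float) -> List[float]:
    """Hypothetical closed-loop model: the command for each syllable waits
    until auditory feedback marking the end of the preceding syllable has
    arrived, so the feedback delay is added to every inter-onset interval."""
    onsets = [0.0]
    for duration in durations_ms[:-1]:
        onsets.append(onsets[-1] + duration + feedback_delay_ms)
    return onsets


def open_loop_onsets(durations_ms: Sequence[float], feedback_delay_ms: float) -> List[float]:
    """Hypothetical open-loop model: each syllable command is issued
    automatically when the preceding syllable ends. The feedback delay is
    accepted only to show that it plays no part in the timing."""
    onsets = [0.0]
    for duration in durations_ms[:-1]:
        onsets.append(onsets[-1] + duration)
    return onsets


syllables = [200.0, 200.0, 200.0, 200.0]  # hypothetical syllable durations (ms)
for delay in (0.0, 200.0):  # hypothetical added feedback delays (ms)
    print(delay, closed_loop_onsets(syllables, delay), open_loop_onsets(syllables, delay))
```

Neither toy model captures the actual disruptions produced by DAF; the point is only that a purely feedback-driven scheme makes a strong timing prediction that the open-loop account does not.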
Footnotes



1

According to Lindgren (1956).

2 

Liberman (1957; p. 122). This hypothesis is based experimentally in
part on the properties of identification and discrimination functions for
synthetic speech continua obtained in the following studies: Liberman, Harris,
Hoffman, & Griffith (1957); Liberman, Harris, Kinney, & Lane (1961); Liberman,
Harris, Eimas, Lisker, & Bastian (1961); Fry, Abramson, Eimas, & Liberman (1962);
Eimas (1963).

Further discussion of these findings, advocating a motor theory of speech
perception, will be found in: Liberman (1957); Lisker, Cooper, & Liberman
(1962); Studdert-Kennedy & Liberman (1963); Liberman, Cooper, Harris, &
MacNeilage (1963); Liberman, Cooper, Harris, MacNeilage, & Studdert-Kennedy
(1967); Liberman, Studdert-Kennedy, Harris, & Cooper (1965); Liberman, Cooper,
Studdert-Kennedy, Harris, & Shankweiler (1965); Cooper (1965); Liberman, Cooper,
Studdert-Kennedy, Harris, & Shankweiler (1968).

3 

Delattre (1958; p. 228). According to this line of reasoning for the
motor theory, invariances in the perception of speech are more closely matched
by articulatory than by acoustic invariances. Thus, the theory has a
second experimental basis in certain studies of the perception of synthetic
speech segments in which small differences in articulation cause large
differences at the acoustic level, or conversely, viz.: Liberman, Delattre,
& Cooper (1952); Liberman, Delattre, Cooper, & Gerstman (1954); Delattre, Liberman,
& Cooper (1955).

Other statements of this line of argument will be found in: Liberman
(1957); Cooper, Liberman, Harris, & Grubb (1958); Lisker, Cooper, & Liberman
(1962); Liberman, Cooper, Harris, MacNeilage, & Studdert-Kennedy (1967);
Liberman, Cooper, Studdert-Kennedy, Harris, & Shankweiler (1965); but see the
alternative interpretation offered by Fant (1967a). A motor-invariance theory of
vowel perception will be found in Joos (1948). The only comprehensive review of
the experimental bases of the motor theory is Lane (1965a).

In a recent review of the invariance question, Delattre seems to have 
changed his 1958 position somewhat: "It would seem that the acoustic correlate 
is closer to linguistic perception than is the articulatory correlate." (1967; p. 23)

4

Ladefoged, Draper, & Whitteridge (1958; p. 9). Also see Draper, Ladefoged,
& Whitteridge (1959).
5

Cited in Fónagy (1966; p. 238). And in Fónagy (1958): "A return to the
original physiological conception of accent classically formulated by Otto
Jespersen is unavoidable since accent is simply not to be defined on an acoustic
level." (p. 55) Also see Gandhi, Peterson, & Yu (1960): "There is considerable
reason to believe that the human observer interprets meaningful sounds in terms
of the various properties of the source rather than according to the acoustical
dimensions and magnitudes of the sounds." (p. 141)

6

Lieberman (1967a; p. 162). Also see: Lieberman (1967b; 1968). 

7

Halle & Stevens (1959; 1962); Stevens & Halle (1967).

8

Hockett (1955; p. 7). For a discussion of the relations between production
and perception in foreign language learning, see Lane (1964); for studies of
their sequencing, see Mace & Keislar (1965), Mace (1966), and Butt (1966).

9

Kozhevnikov & Chistovich (1966; pp. 203-204). Also see Prins (1963);
Chistovich (1961); Liberman, Harris, Eimas, Lisker, & Bastian (1961); Sherman
& Geith (1967).

10

Liberman, Harris, Eimas, Lisker, & Bastian (1961; p. 177). Similarly,
Rutherford (1967) writes: "The infant's auditory impressions of the sounds
[he makes randomly] become linked with the coincidental patterns of tactile
and proprioceptive sensation arising from the tongue, lips, and other articulators.
By 8-10 months, he has heard and felt himself say 'muhmuhmuh' several
thousand times and the syllables 'mama' spoken by Mother are a stimulus sufficient
to evoke the nearly equivalent gesture from his own speech mechanism."
(p. 249)

11

Allport (1924; pp. 182-183). A year before, Smith & Guthrie wrote: "The
dependence of imitation in learning is well illustrated by language acquisition...
the sounds [that the baby] makes accompany the movements that produce them
and, because the vowels are sustained and the consonants either sustained or
repeated, these sounds also precede the movements that continue or reiterate
them. They thus become the conditioning stimuli for their own production,
so that when uttered by others, they are imitated by the baby." (pp. 132 ff.)

Husson and colleagues claim that there is a demonstrable "cochleo-laryngeal"
reflex, mediated by the medulla oblongata (Vannier, Saumont, Labarraque, &
Husson, 1954), but several investigators have failed to substantiate the claim;
one of these (Arslan, 1964) speculates that the electrical discharges in the recurrent
nerve observed after intense sound are the consequence of general reticular
activation, since the discharges are also triggered by painful stimulation and
are accompanied by electrical activity in various somatic muscles.

12 

For a more sophisticated hypothesis concerning the basic units in the child's
auditory-vocal learning, see Skinner's concept (1957) of the "minimal echoic
repertoire."

13 

Sapon (1965b; p. 136). In what we may consider a second-language experiment,
Denes (1967) found no difference in learning to recognize quasi-novel speech
sounds between subjects who were and subjects who were not able to
associate the sounds with patterns of articulation. Pollack & Johnson (1959)
likewise failed to confirm an expectation derived from the motor theory; they found that
associating distinctive motor responses with elements of an auditory display did
not enhance reproduction and identification of the elements.

14 

For example, Hemdal & Hughes (1967). 

15

Lane (1965a, 1965b, 1966, 1968a, 1968b); Cross, Lane, & Sheppard (1965).
Also, using an acoustic-phonetic continuum that was effectively nonspeech for an
aphasic patient (who was also dysarthric), Lane & Moore (1962) established
identification and discrimination functions after 15 minutes of reconditioning.
See also the reply to Cross, Lane, & Sheppard by Liberman, Studdert-Kennedy,
Harris, & Cooper (1965). For a description of the conditioning paradigm that
yields categorical perception, see Lane (1968b), and Cross & Lane (1962).

16 

Kopp & Lane (1967); Kopp (1967); Wilson & Lane (1967); Lane (1968c). 

17 

Liberman, Cooper, Harris, & MacNeilage (1963). 

18 

The form of the discrimination functions and the correspondence between 
obtained discrimination and that predicted from the identification functions can 
be accounted for by this hypothesis, without restriction to a specific sensory 
domain, on the assumption that the "decisions" involved are described by the 
identification functions. The degree of correspondence then depends in part on 
whether experimental parameters, such as the inter-stimulus interval, facilitate 
"deciding, remembering, and choosing." 
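Footnote 18 refers to discrimination "predicted from the identification functions." A common way to make that prediction for ABX, assumed here rather than taken from the cited studies, is to suppose that the listener covertly labels A, B, and X with one of two categories, answers by matching labels when A and B are labeled differently, and guesses otherwise. Averaging over X = A and X = B gives P(correct) = 0.5 [1 + (p_A - p_B)^2], where p_A and p_B are the probabilities of assigning the first label to A and B. The identification values in the example are hypothetical.

```python
from typing import List, Sequence


def predicted_abx(p_a: float, p_b: float) -> float:
    """Predicted proportion correct in ABX for stimuli A and B, assuming
    two-category covert labeling: match labels when A and B are labeled
    differently, otherwise guess. p_a and p_b are P(category 1)."""
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


def predicted_abx_curve(ident: Sequence[float], step: int = 1) -> List[float]:
    """Predicted ABX accuracy for every pair of stimuli `step` positions
    apart along a continuum, given P(category 1) for each stimulus."""
    return [predicted_abx(ident[i], ident[i + step]) for i in range(len(ident) - step)]


# Hypothetical identification function along a seven-step continuum.
ident = [0.98, 0.95, 0.85, 0.50, 0.15, 0.05, 0.02]
curve = predicted_abx_curve(ident, step=2)
print([round(x, 3) for x in curve])
```

Predicted accuracy stays near chance for pairs within a category and peaks for the pair straddling the category boundary; nothing in the computation refers to a particular sensory domain, which is the point footnote 18 makes.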

19 

Ringel, Saxman, & Brooks (1967); Ringel & Fletcher (1967); Ringel & Ewanowski
(1965); Kirikae, Sato, Oshima, & Hirose (1961); Von Békésy (1962, 1949).
20

Melnick's (1965) procedures and findings are at variance with those reported
here; he suggests that suitable corrections for the marked differences
in method would bring the results into alignment.

21

Black (1950), Atkinson (1952), and Alpert (1965) obtained flatter
functions, in the former cases probably because the masking signals were pure
tones.

22

Also see Fletcher (1953); Fry (1954); Peterson (1955).

23

Chase, Sutton, & First (1959); Yates (1963).
Figure Caption

Fig. 1. Distributions of identification probability, discrimination accuracy
(correct ABX), and latency for the hue continuum (from Lane, 1966).